santoku - a visual introduction

David Hugh-Jones

2022-06-08

Santoku

A Japanese kitchen knife.

(image: chopping skills)

{santoku}

An R package for cutting data.

(image: santoku logo)

Some data

head(pts)
##      x   y
## 1  436 278
## 2  861 249
## 3  740 307
## 4  330 349
## 5  278 346
## 6 1327 239
x <- pts$x

plot_the_fish()

chop()

chop() is a replacement for base R’s cut() function.

chop(x, c(300, 600, 900))

extend = FALSE

chop(x, c(300, 600, 900), extend = FALSE)
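
The difference between the two calls above can be imitated in base R. This is only a sketch using cut() on a small made-up vector (x_demo is hypothetical, not the pts data), and the labels cut() produces differ from santoku's.

```r
# Hypothetical demo values, two of them outside the range of the breaks
x_demo <- c(100, 350, 650, 950)
breaks <- c(300, 600, 900)

# cut() covers only the range of the breaks, so outside values become NA
# (comparable to what extend = FALSE asks for)
strict <- cut(x_demo, breaks, right = FALSE)

# Adding -Inf and Inf by hand covers every value
# (comparable in spirit to chop()'s default behaviour)
extended <- cut(x_demo, c(-Inf, breaks, Inf), right = FALSE)

print(strict)
print(extended)
```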

Chopping by a single value

Repeat a break to put that value in its own interval:

chop(x, c(300, 500, 500, 800))

chop_width()

Chops fixed-width intervals

chop_width(x, width = 200)
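
As a rough illustration of fixed-width chopping, here is a base R sketch that builds the breaks by hand. It assumes the first interval starts at the smallest value; x_demo is a made-up vector, and this is not santoku's implementation.

```r
# Hypothetical demo values
x_demo <- c(35, 180, 240, 610, 790)
width <- 200

# Breaks start at the smallest value and step by `width`
# until they pass the largest value
breaks <- seq(min(x_demo), max(x_demo) + width, by = width)

# Left-closed intervals, each exactly `width` wide
groups <- cut(x_demo, breaks, right = FALSE)
print(table(groups))
```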

chop_evenly()

Chops equal-width intervals

chop_evenly(x, intervals = 5)

chop_proportions()

Chops intervals by proportions of the data range

chop_proportions(x, proportions = c(0.2, 0.8))

chop_equally()

Chops intervals with an equal number of elements

chop_equally(x, groups = 5)
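
One way to picture equal-sized groups is to cut at evenly spaced quantiles. The base R sketch below does that for a made-up vector without ties (x_demo is hypothetical); with tied values the quantiles can coincide and cut() would fail, which this sketch does not handle.

```r
# Hypothetical demo values: the numbers 1 to 10 in shuffled order
x_demo <- c(3, 8, 1, 9, 4, 7, 2, 10, 6, 5)
n_groups <- 5

# Evenly spaced probabilities give breaks with equal shares of the data
probs <- seq(0, 1, length.out = n_groups + 1)
breaks <- quantile(x_demo, probs)

# include.lowest = TRUE closes the last interval so the maximum is kept
groups <- cut(x_demo, breaks, include.lowest = TRUE, right = FALSE)
print(table(groups))
```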

chop_n()

Chops intervals with a fixed number of elements

  • The last group may have fewer elements

chop_n(x, 100)
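
The idea of groups with a fixed number of elements can be sketched in base R by ranking the values and counting them off in blocks. This sketch returns group numbers rather than intervals, breaks ties arbitrarily, and uses a made-up vector (x_demo is hypothetical), so it is only an approximation of what chop_n() does.

```r
# Hypothetical demo values
x_demo <- c(12, 5, 9, 30, 21, 17, 2)
n <- 3

# Rank the values, then assign each block of n consecutive ranks to a group
group_id <- ceiling(rank(x_demo, ties.method = "first") / n)

# Seven values in groups of three: the last group has one element
print(table(group_id))
```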

chop_quantiles()

Chops intervals by quantiles of the data

chop_quantiles(x, c(0.2, 0.8))

Summary

Chop by: / Size means:   number of elements   width
Fixed size               chop_n()             chop_width()
Fixed no. of groups      chop_equally()       chop_evenly()
Specific sizes           chop_quantiles()     chop_proportions()

chop_mean_sd()

chop_mean_sd(x)
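
Chopping by standard deviations can be pictured as standardising the data and cutting at whole numbers. The base R sketch below assumes breaks at -3 to 3 standard deviations and uses a made-up vector (x_demo is hypothetical); santoku's own labels and defaults may differ.

```r
# Hypothetical demo values with mean 5
x_demo <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Distance from the mean in units of the sample standard deviation
z <- (x_demo - mean(x_demo)) / sd(x_demo)

# Left-closed intervals one standard deviation wide
groups <- cut(z, breaks = -3:3, right = FALSE)
print(table(groups))
```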

Quick tables

tab(x, c(300, 600, 900))
##   [35, 300)  [300, 600)  [600, 900) [900, 1384] 
##          14          53          48         132
tab_mean_sd(x)
## [-3 sd, -2 sd) [-2 sd, -1 sd)  [-1 sd, 0 sd)   [0 sd, 1 sd) 
##              6             54             52             80 
##   [1 sd, 2 sd) 
##             55

Changing labels

You need one more label than breaks:

chop(x, c(300, 600, 900), labels = LETTERS[1:4])
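
The count of labels follows from the count of intervals: three breaks, extended on both sides, give four intervals. A base R sketch with cut() on a made-up vector (x_demo is hypothetical) shows the same arithmetic.

```r
# Hypothetical demo values, one per interval
x_demo <- c(100, 450, 700, 1200)
breaks <- c(300, 600, 900)

# Three breaks plus -Inf and Inf make five boundaries, hence four intervals
boundaries <- c(-Inf, breaks, Inf)
groups <- cut(x_demo, boundaries, labels = LETTERS[1:4], right = FALSE)
print(groups)
```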

Not sure how many intervals you will have?

Use a lbl_* function.

chop_width(x, 200, labels = lbl_seq())

chop_width(x, 200, labels = lbl_seq("(i)"))

chop_width(x, 200, labels = lbl_dash())

Left-closed and right-closed

Breaks are closed on the left by default.

chop(x, c(200, 500, 800))

For right-closed breaks use left = FALSE:

chop(x, c(200, 500, 800), left = FALSE)
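
The choice only matters for values that fall exactly on a break. A base R sketch with cut(), whose right argument plays a similar role, shows where the value 500 lands in each case (x_demo is a made-up vector, and cut()'s labels differ from santoku's).

```r
# Hypothetical demo values; 500 sits exactly on a break
x_demo <- c(350, 500, 650)
breaks <- c(200, 500, 800)

# Left-closed: 500 belongs to the interval that starts at 500
left_closed <- cut(x_demo, breaks, right = FALSE)

# Right-closed: 500 belongs to the interval that ends at 500
right_closed <- cut(x_demo, breaks, right = TRUE)

print(as.character(left_closed))
print(as.character(right_closed))
```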

Errors

(image: chopping fail)

Sometimes it’s impossible to create the breaks you want.

chop_quantiles(c(-Inf, Inf), c(0.25, 0.75))
## [1] [-Inf, Inf ] [-Inf, Inf ]
## Levels: [-Inf, Inf ]

When the problem comes from the data (x), santoku will try to carry on (e.g. by returning a single interval).

When the problem comes from other parameters, e.g. breaks or extend, santoku will give an error.

chop_quantiles(1:5, c(0.25, NA))
## Error: probs contains 1 missing values

Happy chopping!

https://hughjonesd.github.io/santoku

devtools::install_github("hughjonesd/santoku")

(image: chopping)